Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: remove translated messages when ICU arguments change #9598

Merged
merged 3 commits into from
Aug 23, 2019
Merged

Conversation

brendankenny
Copy link
Member

this came up in the course of writing #9580. If you don't care about an explanation of the status quo, skip to "This PR takes approach 3." :)

when we change a UIString in Lighthouse today:

  • the usual case is we assume existing translations for that message (if any) are better than defaulting back to the english message while waiting for the new translations to come down the pipe, or
  • the string changes so much that it makes sense to change the UIString key for the message, which means there is only the english message for all locales until new translations come through

An example is the recent move to web.dev for the doc links (or grammar changes or rewordings like for themed omnibox). It's very helpful to have the old translations still in place, even if they have a link to slightly older docs in them for a while.

Generally this has seemed to work ok (though we could probably be more cognizant when making these changes), but it does run into a problem when ICU arguments change, since the values passed into str_() won't work for both english and older versions of the message for other locales.

This is what is failing in #9580, where deleting an accidentally-included argument causes the swap-locales test to fail as it tries to use the updated argument on an older string in es that needs the old argument. But even if we don't include the error check from #9580, any non-en-US locale will fail if the old argument values aren't provided.

There are a few possible solutions:

  1. always include the union of needed argument values, as is accidentally being done for third-party-summary.js | displayValue right now
  2. let a test like the one in swap-locale just fail in that instance, forcing the author to manually make corrections
  3. treat any updated message with different ICU arguments as sufficiently different that it's not worth keeping the old translations around any more and delete them.

after talking through these scenarios with @exterkamp, we decided 1 and 2 aren't really tenable as we approach all strings being translated and so any user-visible text change possibly becomes significant, and not all locales for a message land at the same time.

This PR takes approach 3. As noted in the jsdocs:

For every locale LHL file, remove any messages that don't have a matching
message in en-US.json. There is a matching golden message if:

  • there is a golden message with the same message id and
  • that message has the same ICU arguments (by count and argument ids).

If a new en-US.json message is sufficiently different so that existing
translations should no longer be used, it's up to the author to remove them
(e.g. by picking a new message id).

During collect-strings/update:sample-json, after en-US.json is updated, this new script runs through the non-en-US LHL files and deletes any messages that don't fit the above criteria.

Copy link
Member Author

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example new output in the middle of the yarn update:sample-json logfest (from the message changes in this PR)

...
Baking lighthouse-core/lib/i18n/locales/en-US.ctc.json
Baking lighthouse-core/lib/i18n/locales/en-XL.ctc.json
Baked en-US.ctc.json into LHL format.
Baked en-XL.ctc.json into LHL format.

Checking for out-of-date LHL messages...
Removing message
	'lighthouse-core/audits/third-party-summary.js | columnMainThreadTime'
from translations: it is no longer found in Lighthouse.
Removing message
	'lighthouse-core/audits/third-party-summary.js | displayValue'
from translations: its ICU arguments don't match the current version of the message.

  Reading artifacts from disk: ~/lighthouse/lighthouse-core/test/results/artifacts +0ms
  status Analyzing and running audits... +38ms
  status Auditing: Uses HTTPS +4ms
  status Auditing: Redirects HTTP traffic to HTTPS +5ms
...

@@ -116,6 +116,7 @@
"gh-pages": "^2.0.1",
"glob": "^7.1.3",
"idb-keyval": "2.2.0",
"intl-messageformat-parser": "^1.8.1",
Copy link
Member Author

@brendankenny brendankenny Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯\_(ツ)_/¯

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I thought we just removed this as unnecessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I thought we just removed this as unnecessary?

ha, yeah. I mean, it could use new MessageFormat(locale).getAst() but that seemed silly, so back it comes. At least it's no change for what's in node_modules/ :)

for (const [messageId, {message}] of Object.entries(localeLhl)) {
const goldenArgumentIds = goldenLocaleArgumentIds[messageId];
if (!goldenArgumentIds) {
logRemoval(alreadyLoggedCulls, messageId, 'it is no longer found in Lighthouse');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a moderate bonus, old UIStrings that have been removed from Lighthouse are removed from all locales, though this will eventually happen regardless when new tc results come in.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! glad we're starting to do this :) mostly LGTM :)

I do object to the use of cull here though, I think prune/trim/cleanup is more appropriate :)

cull is like selection but we are eliminating obsolete messages not just picking them.

'**/en-XL.json',
];
const globPattern = 'lighthouse-core/lib/i18n/locales/**/+([-a-zA-Z0-9]).json';
const lhRoot = `${__dirname}/../../../`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a particular reason it's not path.join'd?

Copy link
Collaborator

@connorjclark connorjclark Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least for the glob pattern, / should always be used https://github.com/isaacs/node-glob#windows

for cwd .... what happens if you do path.resolve('C:/some/file.js') in windows? https://github.com/isaacs/node-glob/blob/master/common.js#L97 (btw, a few lines later they normalize the path)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a particular reason it's not path.join'd?

cause this line is all it's doing :)

for cwd .... what happens if you do path.resolve('C:/some/file.js') in windows? https://github.com/isaacs/node-glob/blob/master/common.js#L97 (btw, a few lines later they normalize the path)

it turns 'C:/some/file.js' into 'C:\some\file.js', which then, yeah, glob then changes right back.


it('removes nothing if the messages are exactly the same', () => {
const culledLocale = cullLocale(goldenLhl, goldenLhl);
expect(culledLocale).toEqual(goldenLhl);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we clone one of them to make sure it's not mutating goldenLhl or relying on referential equality or something? :)

@@ -116,6 +116,7 @@
"gh-pages": "^2.0.1",
"glob": "^7.1.3",
"idb-keyval": "2.2.0",
"intl-messageformat-parser": "^1.8.1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I thought we just removed this as unnecessary?

@brendankenny
Copy link
Member Author

cull is like selection but we are eliminating obsolete messages not just picking them.

Ha, well I meant as in "to cull the herd", but since it's ambiguous we should probably change it

@patrickhulce
Copy link
Collaborator

Ha, well I meant as in "to cull the herd", but since it's ambiguous we should probably change it

Oh, hahaha. Yeah if we're not talking livestock I immediately go to the picking definition, sorry for raining on the cleverness :)

const localeLhl = require(absoluteLocalePath);
const culledLocale = cullLocale(goldenLocaleArgumentIds, localeLhl, alreadyLoggedCulls);

const stringified = JSON.stringify(culledLocale, null, 2) + '\n';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to that \n.

@brendankenny
Copy link
Member Author

I do object to the use of cull here though, I think prune/trim/cleanup is more appropriate :)

Oh, hahaha. Yeah if we're not talking livestock I immediately go to the picking definition, sorry for raining on the cleverness :)

I'll switch it to prune, so at least it's fun to say like cull

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants